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I. INTRODUCTION 

Stochastic nonlinear dynamical models are widely used 
in studying complex (natural as well as man-made) phe- 
nomena; examples range from molecular motors [l| and 
semiconductor lasers to epidemiology Q and coupled 
matter-radiation systems in astrophysics Q. Accord- 
ingly, much attention has been paid in the statistical 
physics community to the central problem of reconstruct- 
ing (i.e., inferring) stochastic nonlinear dynamical models 
from noisy measurements (see, e.g., |5, 6]). The chief dif- 
ficulty here stems from the fact that, in a great number of 
important problems, it is not possible to derive a suitable 
model from "first principles," and one is therefore faced 
with a rather broad range of possible parametric models 
to consider. Furthermore, experimental data can some- 
times be extremely skewed due to the intricate interplay 
between noise and nonlinearity, making it very difficult 
to extract from data important "hidden" features (e.g., 
coupling parameters) of a model. 

Although no general method exists for inferring the pa- 
rameters of stochastic nonlinear dynamical models from 
measurements, various schemes have been proposed re- 
cently l2,iaii|ll|ll|ll|ll[ll[l3to deal with differ- 
ent aspects of this "inversion" problem. An important 
numerical technique, suggested in 0,0,0], is based 
on estimating drift and diffusion coefficients at a number 
of points in the phase space of the dynamical system. 
This technique was extended further in to handle 
both dynamical and measurement noise. In principle, 
this approach allows subsequent use of the least-squares 
method for the estimation of the model parameters. Such 
an empirical approach, however, requires a considerable 
amount of data and an intensive computational effort 
even for a simple stochastic equation. A more general, 
and efficient, theoretical approach is therefore very desir- 
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able. 

Arguably the most general approach to the solution of 
this problem is furnished by Bayes' theorem ^8, IQ^ 16|. 
Indeed, it was shown in that the Bayesian method pro- 
vides a rigorous and systematic basis for heuristic modi- 
fications made earlier Q to the least-squares method to 
enable its use on noisy measurements. The Bayesian ap- 
proach was employed in Q to estimate levels of dynami- 
cal and measurement noise for a known dynamical model. 
The Bayesian method has also been used for parameter 
estimation in maps in the presence of dynamical ^ and 
weak measurement ^3 noise. Finally, an application of 
the Bayesian method to continuous systems was consid- 
ered in [llj]. 

A common drawback of these earlier works is their 
exclusive reliance on numerical methods for the opti- 
mization of cost functions and the evaluation of multi- 
dimensional normalization integrals encountered in the 
theory. This disadvantage becomes increasingly more 
pronounced when systems with ever larger numbers of 
unknown parameters are investigated. Another major 
deficiency is that most of the earlier works deal with 
discrete maps, and the corresponding results are there- 
fore not immediately applicable to continuous systems, 
since the transformation from noise variables to dynami- 
cal variables is different in discrete and continuous cases. 
Specifically, as will be shown below, a prefactor account- 
ing for the Jacobian of the transformation must be in- 
cluded in the likelihood function in the continuous case. 
Such a prefactor was considered in in the context of 
Bayesian inference for continuous systems; however, an 
ad hoc likelihood function was used there instead of the 
correct form derived here. 

In this paper, we introduce a new technique for 
Bayesian inference of stochastic nonlinear dynamical 
models from noisy measurements. At the core of our algo- 
rithm is a path-integral representation of the likelihood 
function that yields the correct form for the Jacobian 
prefactor. This term provides optimal compensation for 
the effects of dynamical noise, thus leading to robust in- 
ference for a broad range of dynamical models. Another 
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key feature of the approach is a novel parameterization 
of the vector "force" field, which permits an analytical 
treatment of the inference problem, thus obviating the 
need for extensive global optimization. These improve- 
ments lead to an efficient and accurate algorithm for re- 
constructing from time-series data models of stochastic 
nonlinear dynamical systems with large numbers of un- 
known parameters. 

The paper is organized as follows. The general for- 
mulation of the problem and its analytical solution are 
presented in Section^ The algorithm is then applied in 
Section lill Al to data from the stochastic Lorenz system, 
and its performance is compared with those of earlier 
research. The advantages of the present method are fur- 
ther illustrated in Section Fill Bl bv inferring a model for 
a system of five globally- and locally-coupled noisy oscil- 
lators. Finally, the results are discussed and conclusions 
are drawn in Section ITVI 



II. THEORY OF RECONSTRUCTION OF 
STOCHASTIC NONLINEAR DYNAMICAL 
MODELS 

A. Problem description 

We envision a typical experimental situation where the 
stochastic trajectory x(i) of a dynamical system is mea- 
sured at sequential time instants n — 0, 1, ... , N}, 
and a set of data y = {y„ = y(irt)} is thus obtained. 
For instance, x{t) may represent the coordinates of a 
molecular motor progressing along a microtubule yj or 
the fluctuating Stokes vector of a semiconductor laser 
field . Our objective is to extract from y all available 
information regarding the dynamical evolution of x(t). 
As mentioned in Section we advocate the Bayesian ap- 
proach for the solution of this problem. Toward this end, 
one has to introduce a parametric model for the dynami- 
cal system and a statistical model for the measurements. 
These elements allow one to incorporate into the solu- 
tion of the reconstruction problem any available a priori 
information on the time-series data (stationarity, embed- 
ding dimension, etc.), as well as expert domain knowl- 
edge (e.g., a theoretical analysis of the physics problem 
at hand). 

The dynamical and measurement equations commonly 
adopted in the context of model reconstruction are 



±{t) = f(x;c)+|(i), 
y{t) - ^{t)+u{t), 



(1) 



with x,y,|,iv e R^, and f : i-^ M^. Here, the first 
equation represents the dynamical model in the form of a 
set of coupled nonlinear Langevin equations with a vec- 
tor field f (x; c) parameterized by unknown coefficients 
c G R*^, and the second equation relates the observa- 
tions to the system trajectory. We assume that the ad- 
ditive dynamical and measurement noise processes ^(t) 



and i>(t) are stationary, white, and Gaussian with 



Mt)) =0, {,.{t)i.^{t'))^enS{t-t'), 



(2) 



where D and e are also typically unknown. Thus, the 
elements {cml rn = 1,2,..., M} of the model coefficient 
vector c, the elements {Dw; 1,1' = 1,2, L} of the dy- 
namical noise covariance (or diffusion) matrix D, and 
the measurement noise intensity together constitute 
the complete set 



M^{c,-b,e} 



(3) 



of unknown parameters. The model reconstruction prob- 
lem, then, is that of inferring the elements of the param- 
eter set M from the measured time-series data 3^. 



B. Bayesian inference 

In Bayesian model inference, two distinct probability 
density functions (PDFs) are ascribed to the set of un- 
known model parameters: the prior ppj:{A4) and the pos- 
terior ppsiM\y), respectively representing our state of 
knowledge about A4 before and after processing a block 
of data y. These two PDFs are related to each other via 
Bayes' theorem 0: 



PpsiM\y) 



Piy\M)pp,iM) 

Jp{y\M)ppAM)dM 



(4) 



Here, the sampling distribution p{y\M.) is the conditional 
PDF of the measurements y for a given choice of the 
model M; it is also referred to, as we do, as the like- 
lihood of given y. Meanwhile, the prior acts as a 
regularizer, concentrating the parameter search to those 
regions of the model space favored by our expertise and 
any available auxiliary information. This initial assign- 
ment of probabilities must, of course, be "coherent" |l7l| . 
i.e., consistent, at least implicitly, with the physics of the 
problem. In practice, Q can be applied iteratively using 
a sequence of data blocks 3^,3^', .. .; the posterior com- 
puted from block y serves as the prior for the next block 
y, etc. For a sufficiently large number of observations, 
Pps{M\y, y, . . .) becomes sharply peaked around a most 
probable model Ai* . 

We note that Bayes theorem allow for a sim- 
ple geometric interpretation and has a clear physical 
sense. Indeed, denote by Ai all the events corre- 
sponding every possible values of the model parame- 
ters (these events are encircled in the figure ^ by the 
solid line). The events corresponding to the measured 
time-series data are shown by X. Then the inter- 
section of the two sets of events can be factories in 
two ways P{X r\M)=P{X\M)PiM)=PiM\X)P{X), 
where P{X\A4) and P{A4\X) are corresponding condi- 
tional probabilities. Taking into account that P{X) = 
/ dMP{X\M)P{M) and dividing last equation by P{X) 
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FIG. 1: Illustration of the geometrical interpretation and 
physical meaning the Bayes theorem. Ai denotes set of all 
possible values of the model parameters. Events correspond- 
ing to the measured time-series data are shown by X. In- 
tersection of the two sets is shown by the shaded area. The 
geometrical interpretation of the Bayes theorem is that the 
likelihood function cuts out of the all possible model param- 
eters those that correspond to the observed time-series data. 



we have Bayes theorem, then geometrical interpreta- 
tion of the Bayes theorem is that the Ukelihood func- 
tion cuts out of the all possible model parameters those 
that correspond to the observed time-series data. It has a 
clear physical meaning that states that given initial guess 
about model parameters it can be improved using results 
of the measurements. This theorem plays fundamental 
role in the modern theory of measurements. 

The main thrust of recent research on stochastic non- 
linear dynamical model reconstruction [3, 0, ITol ITlj 
has been directed towards developing (i) efficient op- 
timization algorithms for extracting the most probable 
model Ai* from the posterior, and (ii) efficient multi- 
dimensional integration techniques for evaluating the 
normalization factor in the denominator of These 
efforts have mostly employed ad hoc expressions for the 
likelihood function (see, e.g., the cost function of Eq. (31) 
in pjj ) ; consequently, the resulting inference schemes fail 
to properly compensate for the effects of noise. In fact, it 
appears that there is a lack in the model-reconstruction 
literature of a closed-form expression (expanded to cor- 
rect orders in the sampling period) for the likelihood 
function of the measurements of a continuous system tra- 
jectory. 

Below we introduce a new approach to Bayesian in- 
ference of stochastic nonlinear dynamical models. The 
method has two key analytical features. Firstly, the like- 
lihood function is written in the form of a path integral 
over the stochastic system trajectory, which includes a 
prefactor that optimally compensates for the detrimen- 
tal effects of (dynamical) noise. Secondly, we suggest 
a novel parameterization of the unknown vector field, 
which renders the inference problem essentially linear for 
a broad class of nonlinear dynamical systems, and thus 
helps us find optimal parameter estimates without ex- 
tensive numerical optimization. These features enable 



us to write an efficient and accurate Bayesian inference 
algorithm for reconstructing models of nonlinear dynam- 
ical systems driven by noise. As a prelude to the formal 
development that follows, the reader may at this point 
wish to review the theory given in Appendix A for the 
maximum-likelihood reconstruction of a one-dimensional 
system model. 



C. The likelihood function 

As we pointed out above, one of the central challenges 
in the inference of stochastic nonlinear dynamical models 
is the derivation of a suitable likelihood function that op- 
timally compensates for the effects of noise. A key ingre- 
dient in this context is the probability density functional 
\^(t)] of finding the system in "state" x(t) at time 
i Ellllll^.'20]- This is supplemented by Pohiy\X) de- 
noting the PDF of observing a time series y for a specific 
realization X ~ {x„} of the system trajectory. Thus, we 
may express the likelihood function very generally in the 
form of a path integral over the random trajectories of 
the dynamical system as 



p{y\M) 



X(tf) 



(ti) 



Pohiy\X)TM[^it)]VMt), (5) 



giving the probabilistic relationship between the obser- 
vations y and the unknown parameters Ai of the model 
Here, we choose ^ to < ^JV =^ so that p{y\M) 
does not depend on the particular initial and final states 
x(ii), x(if ). We note that the path-integral approach has 
also proved useful in nonlinear filtering of random signals 
(see, e.g., |2|) where standard spectral and correlation 
analyses fail. 

The explicit form of JFAw[x(t)] has been given in pH 
l23j|: however, in the context of dynamical inference, 
it is not necessary to employ this exact form as one can 
usually rely on the smallness of the sampling interval. 
Accordingly, adopting a uniform sampling scheme tn = 
to + nh, we assume here for the sake of simplicity that 
h = {t^ — to)/N is small, and rewrite ^ using a mid- 
point Eulcr discretization scheme in the form 



x„+i = x„ + /if(x„; c) + z,-, 



(6) 



where x„ = i (x„+i -I- x„), while z„ are independent, 
zero-mean, Gaussian random variables with covariance 
(z„ z^,) — hi) 6nn' ■ The probability of a particular real- 
ization {z„} of the dynamical noise process is simply 



N-l 

m^n}] = n 

n=0 



dz„ 



[2Trh)L\-D\ 



cxp 



.(7) 



Changing to dynamical state variables using jnj, we thus 
obtain the desired PDF for the dynamical system to 
have an arbitrary trajectory {x„}: 
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^A4[{x„}] = pst(xo) J({x„}) 

X — ^ = exp 



n=0 J{27Th)^\T>\ 



2h 



[x„+i - x„ - /if(x„;c)]'^ D ^ [x„+i - x„ - /if(x„;c)] 



(8) 



where Pst(x) signifies the stationary distribution of x(t), 
and the Jacobian of the transformation is given by 



dxi'r 



N 



nn 

n=l 1 = 1 



h a//(x„;c) 
2 



dxi. 



exp 



N 



- ^tr*(x„;c) 

n=l 



(9) 



approximated to leading order in h, with the elements of 
the matrix $ defined as (x;c) = dfi{x;c)/dxii. 

The evaluation of (jSJ requires, in addition, that 
we adopt a specific form for the measurement PDF 



Poh{y\'^)- We assume here that, for each trajectory com- 
ponent xi{t), the measurement error e is negligible com- 
pared with the fiuctuations induced by the dynamical 
noise; i.e., ^ hDu. Consequently, we may use 



N 



Pobiy\x) ~ II sivn - X„) 



(10) 



Tl = 



in 0, and the set of unknown model parameters to be 
inferred from data reduces to = {c, D}. With this 
substitution, Q is easily evaluated; introducing yn = 
^ (yn+i + Yn); wc Write the result in the form 



--lnp{y\M) - --ftt(yo) + iln(2^/i)+ln|D| 
h 

+ ^ I] {tr*(y„;c) + [y„-f(y„;c)]TD-i[y„-f(y„;c)]}, (11) 

n=0 



where we introduced the "velocity" y„ = (y^+i —yn)/h. 
It is important to note that this likelihood function is 
asymptotically exact in the limit h and N —^ oo, 
with the total observation duration T = Nh remaining 
constant. 

It is the term tr ^(y„; c) in the above that provides op- 
timal compensation for the detrimental effects of dynam- 
ical noise, and distinguishes our likelihood function from 
those introduced in earlier works. Formally, this term 
emerges from the path integral as the Jacobian of the 
transformation from noise variables to dynamical vari- 
ables 113, 113. We emphasize, however, that this is not 
merely a correction term, but is in fact crucial for accu- 
rate inference. In particular, for a small attractor (with 
characteristic length scale smaller then square root of the 
noise intensity) the inference is only possible due to this 
term as will be shown in Section IlIII 

D. Parameterization of the unknown vector field 

As mentioned in Section ^ one of the main difficulties 
encountered in the inference of stochastic nonlinear dy- 
namical models is that the cost function, defined in H15|) 
below, is generally nonlinear in the model parameters. 



thus requiring the use of extensive numerical optimiza- 
tion methods for finding its global minimum. The param- 
eterization we now introduce avoids this difficulty while 
still encompassing a broad class of nonlinear dynamical 
models. Indeed, many of the model reconstruction exam- 
ples considered in earlier works on stochastic nonlinear 
dynamical inference can be solved within this framework. 
Moreover, a large number of important practical appli- 
cations (see, e.g., [l^H^) can also be treated using the 
same approach. 

We parameterize the nonlinear vector field in the form 

f (x; c) = U(x) c, (12) 

where U(x) is an L x M matrix of suitably chosen ba- 
sis functions, and c is an M-dimensional vector of un- 
known parameters. The choice of basis functions is open 
to any appropriate class of (polynomial, trigonometric, 
etc.) functions that may be required for a satisfactory 
representation of the vector field. In general, if we use 
G different basis functions {(j)g{x); g — 1,2,...,G} to 
model the vector field f , then the matrix U will have the 
block structure 
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U = 



/(/.I 
01 



\ 



/</'2 
02 



/0G 



^2 / \ 



' V 





0G / 



(13) 



comprising G diagonal blocks of size L x L [M = Gi), 
with the X dependence suppressed for brevity. An im- 
portant feature of (|12() for our subsequent development 
is that, while possibly highly nonlinear in x, f(x;c) is 
strictly linear in c. 

As shown next, and H12|l are the two main ingre- 
dients that enable an analytic solution to the problem 
of stochastic nonlinear dynamical model inference from 
time-series data. 



E. The algorithm 

We start by choosing a prior model PDF that is Gaus- 
sian in c and uniform in D: 



Ppr(-M) exp 



Substituting (jTJl), (UH), and the likehhood p(3^|7W) given 
by Hll() into Q), we obtain the posterior model PDF in 
the form Pps(A^|3^) = const x exp[— S'j;(c, D)], where 

Sy{c, D) = py{t>) - cT W3;(D) + ^ E.y{t>) C (15) 

is the cost function whose global minimum yields the 
most probable model M* = {c*,D*}. Here, use was 
made of the definitions 



P^(D) - ^ ^y,TD-^y„ + f ln|D| 

n=0 

Af-1 r 



N-1 



(16) 
(17) 
(18) 



ri=0 



where = U(y„), v„ = v(y„), and the components of 
the vector v(x) are 



dxi 



m = 1,2,...,M. 



(19) 



For a given block of data 3^ of length (A^ + 1), the best 
estimates for the model parameters are given by the pos- 
terior means of c and D, which coincide with the global 
minimum of Sy{c, D). We handle this optimization prob- 
lem in the following way. Assume for the moment that 



c is known in 115|l : for the first iteration, take c = Cpj.. 
Then, minimizing S'j;(c, D) with respect to D, we find 
that the posterior distribution over D has a mean 



N-l 



ji=0 



c ( yn - U„ C 



(20) 



Assume next that D is known, and note from (|15|l that 
in this case, the posterior distribution over c is Gaussian. 
Its covariance is given by H3;(D), and its mean 



c) -H-i(D)wj;(D) 



(21) 



(14) H 



minimizes Sy{c^Ti) with respect to c. Thus, for the sec- 
ond iteration, Cpr and Spr are replaced with (c) and 
Hj;(D), respectively. This two-step (analytical) opti- 
mization procedure is continued iteratively until conver- 
gence, which is typically much faster than brute-force nu- 
merical optimization that has been attempted in earlier 
works. 

It is worthwhile to pause here and refiect on the con- 
tent of (|17|l . The first term in the sum is essentially the 
generalized least-squares (GLS) result (see Appendix B), 
and vanishes at the attractors of the dynamical system 
(Q. On the other hand, the second term in the sum 
on the right-hand side of (|17|l . originating from the term 
tr^(y„;c) in Hll|) . does not vanish at an attractor, and 
is in fact crucial for accurate inference in the presence of 
noise. This can be demonstrated analytically by rewrit- 
ing the sum in integral form as 



Wy(D) = SprCpr 



c(to+T) 



(to) 



U[y(i)]^D-My 



2 ./t, 



to+T 



(22) 



It can now be seen that, for an attractor localized in the 
phase space, the first integral will remain finite since the 
initial and final points of integration both belong to the 
attractor. Meanwhile, the second integral in (|22() will 
grow with the duration of observation T . In particular, 
for a point attractor, the first integral is identically zero 
and the second, "compensating" term alone contributes 
to inference. This result is intuitively clear since, in the 
absence of noise, the system will stay forever at the same 
point (i.e., the point attractor) and no structure can be 
inferred. It is the dynamical noise that forces the system 
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to move about in the phase space, thus making it possible 
to infer its structure from time-series data. 

In general, then, both of the integral terms in (|22|l are 
needed to optimally compensate for the effects of dynam- 
ical noise and thus enable robust convergence of our in- 
ference algorithm. The relative importance of these two 
terms will be investigated quantitatively in the following 
section. 



III. INFERENCE EXAMPLES 

We have verified the accuracy and robustness of our al- 
gorithm on several different types of dynamical systems. 
Here, we discuss its performance on two representative 
examples. 



A. The Lorenz system 

We start with the archetypical chaotic nonlinear sys- 
tem of Lorenz, 



ii = cr {X2 - Xi) + ^i(t), 

±2 = rXi-X2~XiX3+S,2{t), 

±3 = XiX2- bx3 + ^3{t), 



(23) 



augmented by zero-mean Gaussian noise processes ^i(t) 
with covariance {£,i{t) {t')) = Du/ S(t — t'). Synthetic 
data (with no measurement noise) were generated by sim- 
ulating using the standard parameter set a — 10, 



28, b 



_ 8 



and for various levels of dynamical noise 



intensities as explained below. The phase portrait of the 
Lorenz system with dynamical noise is shown in Figure |21 
along with the noiseless case to visually convey the diffi- 
culty of the inference problem. 



1. Parameter estimation with strong dynamical noise 

We compare now the performance of our algorithm 
with the results of earlier work jlj]. No attempt was 
made in ^| to identify the model of the system and 
only four unknown parameters were estimated. 

In parameter estimation, the functional form of the 
nonlinear force field - in this case, the right-hand side 
of (|23|l - is assumed known, and the associated coeffi- 
cients are then estimated from data. This is the ap- 
proach reported in where the diffusion matrix is 
taken in the form D = I, and the unknown param- 
eters {a,r,b,T^} are estimated via extensive numerical 
optimization of a cost function by simulated annealing 
and back-propagation techniques. We now demonstrate 
that our algorithm can estimate the parameters of the 
system H23f) extremely efficiently and with very high ac- 
curacy. 

First we notice that since the diffusion matrix is diag- 
onal, our algorithm is reduces in this case to the trivial 




FIG. 2: The phase portrait of tlie chaotic nonlinear Lorenz 
system l|28^ with the standard parameters (see text): (a) de- 
terministic system; (b) stochastic system with strong dynam- 
ical noise, simulated with a diagonal diffusion matrix having 
elements Dn = 1500, D22 = 1600, and D33 = 1700. (All 
quantities in the equations and figures are dimensionless in 
this paper.) 



one-dimensional analytical solution of the problem for 
each equation in the form (cf with ^7^, (UHl), HOI), HU 
and see Appendix A for the details) 



C,; = 



i = l 



where 



Wil 



2 dxi 



and 



ft = E 



n=0 



"ilVil 'Pil'Pi2 



'il4>iL \ 
•i24>iL 



Noise intensity is found according to H2()|) . We note that 
in each equation we now have different basis functions 
4>ii. For the first equation we have the following two 
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basis functions: = xi and (j)i2 = X2- For the second 
equation we have: 4)2i = Xi, <p22 — X2, and 023 = Xix-^. 
And for the last equation we have: ^31 = xiX2, 4'32 = X3. 

Thus there are a total of 8 unknown parameters to 
be estimated: a seven-dimensional coefficient vector c 
and the noise intensity r^. (Note that this is already 
more ambitious than what was done in Wl\ , since we are 
attempting to estimate all model coefficients, including 
those that are equal to ±1.) 

The convergence of our scheme is so rapid that it is 
feasible to use the algorithm in real time on "streaming" 
data. To make a fair comparison we use the same number 
of data points as in . As an indication of the inference 
accuracy, we quote in Table results for data simulated 
with the standard Lorenz parameter set and two values 
of dynamical noise intensity for weak and strong cases. 

Now we turn to an extension of our approach that al- 
lows for efficient identification of the Lorenz system from 
a substantially extended model space with 33 unknown 
parameters. 



2. Model reconstruction with strong dynamical noise 

When the analytical form of the nonlinear force field is 
not known a priori, one may adopt a parametric model, 
as was done in H12(l : in this setting, it is more appro- 
priate to refer to the inference problem as model recon- 
struction. In practical terms, the main difference be- 
tween parameter estimation and model reconstruction is 
in the number of unknown parameters involved, which 
is typically an order of magnitude larger in the latter 
case. This proliferation of unknowns is one of the main 
reasons why inference methods that rely on brute-force 
numerical techniques are rendered largely impracticable 
for model reconstruction. On the other hand, owing to 
its analytical foundation, our algorithm is quite capable 
of handling this more difficult task, as we demonstrate 
below. 



TABLE I: Inference results for the parameters of the system 
(I23I I with weak (first set) and strong (second set) dynamical 
noise. A synthetic data set of 4,000 points was generated for 
each case by simulating the system with a diffusion matrix 
D = I, and subsequently sampling its trajectory with 
h = 0.002. 



Parameter 


Value 


Estimate 


a 


10.00 


9.9916 


r 


28.00 


27.8675 


b 


2.667 


2.6983 


T 


1.00 


0.9965 


a 


10.00 


9.9039 


r 


28.00 


28.3004 


b 


2.667 


2.8410 


T 


40.00 


39.9108 



TABLE IL Inference results for the parameters of the model 
12411 . (For brevity, only a representative subset of the 6;;';" 
and Dill parameters is shown.) Synthetic data, comprising 
200 blocks of 600,000 points each, were generated by simu- 
lating the system with the standard Lorenz parameter set 
and a diagonal diffusion matrix, and subsequently sampling 
its trajectory with h — 0.005. 



Parameter 


Value 


Estimate 


0.11 


1 A nn 
— iO.OO 


1 n 
— iO.oo 


021 


zo.UU 


Zl.OO 


031 


n n 
0.0 


r\ AO 
— U.4J 


0.12 


10.00 


10.77 


022 


1 nn 
— i.OO 


n 1 n /I 
— 0.194 


a-i2 


0.0 


0.596 


ai3 


0.0 


0.065 


023 


0.0 


0.001 


0.33 


-2.667 


-2.759 


bill 


0.0 


0.013 


6211 


0.0 


0.001 


6311 


0.0 


0.018 


6112 


0.0 


0.002 


6212 


0.0 


-0.012 


6312 


1.00 


0.995 


6113 


0.0 


-0.016 


6213 


-1.00 


-0.985 


Dii 


1500.0 


1522.1 


D22 


1600.0 


1621.5 


D33 


1700.0 


1713.4 



We start by considering the data set of Figure|21 where 
the structure of the Lorenz attractor is drastically ob- 
scured by the presence of strong dynamical noise (almost 
2 orders of magnitude stronger then in ^3). We wish 
to fit this data set with a polynomial model of quadratic 
nonlincarity. Toward this end, we introduce a parametric 
model of the form 

3 3 
ii = bwi"Xi>{t)xi,i{t)+^i{t), (24) 

i'=i i',i"=i 

I, I', I" = 1,2,3. Including the elements of the (sym- 
metric) diffusion matrix D, we now have a total of 
33 unknown parameters comprising the set M = 
{{ail'}, {biinii}, {Dill}}. Despite the restriction to linear, 
bilinear, and quadratic polynomial basis functions, l|24l) 
still represents an extremely broad class of dynamical 
models. Assuming no measurement noise for simplicity, 
the application of our algorithm entails the use of equa- 
tions 1(20)1 and (EIJ) with (jTTI) and ijTSJ). The inferred pa- 
rameter values are shown in Tabled it can be seen that, 
even in this case of extremely strong dynamical noise, 
our algorithm succeeds in accurately reconstructing the 
Lorenz model. 



8 




FIG. 3: Results for (a) the posterior mean (i.e., inferred 
value) and (b) the posterior variance (i.e., associated uncer- 
tainty) of the model parameter 021 corresponding to param- 
eter r of the Lorenz system l|24^ as a function of increasing 
observation duration. Curve 1: {Ai} = {0.01,0.012,0.014}, 
h = 0.002; curve 2: {Ai} = {100,120,140}, h = 0.00002. 
The time instant of step-like decrease in the variance is indi- 
cated by the vertical dashed line. 



3. Accuracy of the inferred parameters 

The accuracy of the reconstruction depends on a num- 
ber of factors. As an example, consider the inferred val- 
ues and variances of the Lorenz parameter r as a func- 
tion of the total observation duration, shown in Figure[3| 
for two different levels of noise. Of particular note is a 
sharp, step-like decrease in the variances that occurs on 
the same time scale as the period of system oscillations, 
Tosc — 0.6 (marked by the dashed line in Figure OJ. In 
addition to the total observation duration T, the infer- 
ence error is also sensitive to the values of the sampling 
interval h and the noise intensities Du. For example, for 
the parameters of curve 1 in Figure 13 the relative in- 
ference error was 0.015%. When the noise intensity was 
increased by a factor of 10'* (curve 2 in Figure ISJ, the 
ratio T/h (i.e., the number of data points N) had to be 
increased by at least a factor of 250 to achieve an infer- 
ence error below 1%. 



TABLE III: Inference results for the parameters of the model 
12411 ■ obtained using 200 blocks of 600,000 data points each, 
sampled a.t h = 0.005. True and inferred parameter values 
are shown along with the corresponding error (relative and 
absolute errors for the nonzero and zero parameters, respec- 
tively). The inference error is below 1% for all parameters, 
and much less for most. 



Parameter 


Value 


Estimate 


% error 


ail 


— iU.UUUU 


n nno a 
— 9.9984 


n m f; 1 
U.U161 


021 


20.UUUU 




A n/1 n^i 




U.U 


n nn c o 
— U.UUiJZ 


n 1 on 
— U.oloU 


a2i 


10.0000 


9.9982 


0.0178 


a22 


1 nnnn 
— l.UUUU 


1 nn 1 
— l.UUol 


n K 1 on 
U.512U 


023 


n n 
U.U 


n nno 1 
U.UUol 


n on 'TO 


asi 


n n 
U.U 


n nni a 
U.UU14 


n 1 onn 
U.lo9U 


032 


n n 
U.U 


n nni k 
U.UU1& 


n 1 C yl 
U.lo4z 


133 


-2.6667 


-2.6661 


0.0196 


bill 


0.0 


0.0002 


0.0179 


&211 


0.0 


0.0002 


0.0238 


&311 


0.0 


-0.0004 


-0.0401 


&112 


0.0 


-0.0002 


-0.0208 


&212 


0.0 


-0.0002 


-0.0223 


&312 


1.0000 


1.0006 


0.0607 


&113 


0.0 


-0.0001 


-0.0111 


&213 


-1.0000 


-1.0004 


0.0446 


Du 


0.2867 


0.2865 


0.0587 


D22 


0.4087 


0.4081 


0.1564 




0.5118 


0.5148 


0.5946 


D12 = D21 


0.2052 


0.2049 


0.1442 


Dl3 = D31 


0.1069 


0.1061 


0.7657 


D23 = -D32 


0.1814 


0.1812 


0.1028 



We have observed that it is generally possible to 
achieve arbitrarily accurate inference results with a (suf- 
ficiently small) fixed sampling interval by increasing the 
total duration of observation; this is true even in the case 
of a full (i.e., non-diagonal) diffusion matrix. Indeed, 
we were able to achieve highly accurate parameter esti- 
mates for sampling intervals ranging from 10~^ to 0.01 
and noise intensities ranging from to 10^. As an ex- 
ample, we summarize in Tabic IIIII our inference results 
for the model (|24|) with a full diffusion matrix, showing 
extremely high accuracy. 



4. Optimal compensation for the noise-induced errors 

Finally, we would like to demonstrate the importance 
of the Jacobian prefactor (O included in our likelihood 
function by examining the inference results obtained with 
and without this term. As shown in Figure^ for param- 
eter r of the Lorenz system, the omission of the prefactor 
in the likelihood function results in a systematic under- 
estimation of this parameter, whereas the inclusion of 
this term leads to an accurate inference as it optimally 
compensates for the effects of dynamical noise. 
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FIG. 4: Demonstration of improved inference accuracy due 
to the prefactor Q in the hkehhood function. The true value 
of the parameter being inferred is indicated by the vertical 
dashed line. The sohd and dashed curves respectively show 
the histograms of parameter values inferred by our algorithm 
and by the generalized least-squares method, which lacks the 
Jacobian prefactor. The histograms were built from an ensem- 
ble of 1,000 numerical experiments with 90,000 data points 
each. 



5. Discussion of results 

The Lorenz system provides a concrete example with 
which to emphasize the advantages of our algorithm over 
previous work. We note firstly that we derive the correct 
form of the likelihood function and avoid using of ad hoc 
likelihood function introduced in Furthermore, we 

obtain analytical solution of the problem. This innova- 
tions allow us to estimate model parameters of the Lorenz 
system much faster and more accurately using the same 
number of points as in ,11]. Furthermore, our results un- 
like the results reported in 11] do not depend neither on 
the choice of initial values for the model parameters nor 
on the ad hoc conditions imposed on the analysis of the 
experimental data to exclude points from certain regions 
of the phase space. 

The computational efficiency of our algorithm also al- 
lows us to lift the practical limitation on the total num- 
ber of data points used for inference in previous work. 
(The relatively small number of points (4,000) used for 
inference in was dictated by the complexity of the ex- 
tensive numerical optimization algorithms used therein.) 
In our approach to inference, processing of 0(10^) data 
points takes only a few seconds on a personal computer 
with a 1-GHz CPU, therefore enabling the use of very 
large data sets to achieve arbitrarily accurate model re- 
construction. 

More importantly, the efficiency of our algorithm al- 
lows us to extend substantially the dimensionality of the 
model space. As a consequence it can be efficiently ap- 
plied to deal with a more general problem of model recon- 
struction, when the functional form of a nonlinear vector 



field is unknown 



B. A system of five coupled oscillators 

The limitations of inference algorithms that rely on 
numerical methods for global optimization and multi- 
dimensional integration come into sharper focus when 
systems with large numbers of model parameters are in- 
vestigated. We now wish to illustrate the advantages of 
our algorithm by inferring a model for a system compris- 
ing five locally- and globally-coupled van der Pol oscilla- 
tors with 0(10^) unknown model parameters. 

With K — 5, the system under study is 



Uk 



Vk, 



Vk = ei{l - ul)vk - ujlUk + J2k'=i\kVkk' Uk' 

+ Uk [ikik-l) Uk-l + "/k(k+l) Uk+l] 
+ l^k' = l'^kk' fJ-k' , 

where {iik{t)} are mutually independent, zero-mean, 
unit- variance, delta-correlated, Gaussian noise processes. 
We assume (for simplicity) that there is no measurement 
noise, and that the state is partially observed to pro- 
duce the signal y — [vi V2 vs V4 v^]'^ . The state of the 
system is thus described by the 10-dimensional vector 
X = [ui ... U5 wi ... v^]"^ . We note, however, that 
values of Uk in the model (|25|l are assumed to be know 
and do not have to be inferred. Therefore the problem is 
reduced to the inference of the model parameters of five 
couple equations for Vk , which are parameterized accord- 
ing to the l(T^ with D = aa'^ . 

The phase portrait of this system, projected onto the 
(ui,U2,U3) subspace of its phase space, is shown in Fig- 
ure [S] for some nominal set of model parameters. 

We choose the following basis functions with which to 
reconstruct the model: 



} (25) 



(pk 


= Uk, 


4>k+K 


= Vk, 


k+2K 


= ulvk 




= uf, 


2+3K 


= Ul U2 



^5' 



k = 1,2, . . . , K . Together with the elements of the (sym- 
metric) diffusion matrix D, we thus have a total of 165 
model parameters to infer. We summarize in Table M\ 
the results of our algorithm for the first oscillator, once 
again showing high inference accuracy. Additionally, the 
convergence of the parameters of the fifth oscillator and 
the noise intensities to their correct values is shown in 
Figure El as a function of the amount of data used. 
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FIG. 5: The phase portrait of the system (1251 as pro- 
jected onto the (ui,«2,U3) subspace: (a) deterministic sys- 
tem; (b) stochastic system with a diagonal diffusion matrix 
of the form D — 100 1. (Note the scale change between the 
axes of the two figures.) See Table ITVI and Figure |S] for the 
values of some of the model parameters used in the simulation. 



In order to further highlight the vital noise compensa- 
tion effect provided by the prefactor term in the likeli- 
hood function used in the present work, we compare in 
Figure [71 inference results for one of the coefficients of the 
system (|25|l . ei, obtained with two different diffusion ma- 
trices D and D/4, where the matrix D chosen at random 
to be 



D 



3.9628 2.9636 

2.9636 5.5045 

0.6176 2.7690 

2.4941 5.2893 

2.5068 5.5421 



0.6176 
2.7690 
4.6974 
4.8813 
3.0284 



2.4941 2.5068 

5.2893 5.5421 

4.8813 3.0284 

7.1428 4.6732 

4.6732 7.5784 



(26) 



As discussed earlier, without the Jacobian prefactor lO, 
(|21|l reduces to the GLS estimator. Figure shows that 
the GLS estimator systematically overestimates the value 
of £i; the larger the noise intensity, the larger the system- 
atic error, reaching a few hundred per cent in this case, 
as shown by curves 1' and 2'. On the other hand, when 
the proper Jacobian prefactor is included in the likeli- 
hood function as in Hll|l . we are able to achieve optimal 
compensation of the noise-induced errors, as shown by 



TABLE IV: Inference results for the parameters of the first 
oscillator in the system (I25II . obtained using 50 blocks of 
150,000 data points each, sampled ai h = 0.06. The inference 
error is well below 1% for all parameters. 



Parameter 


Value 


Estimate 


% error 


ei 


-8.40 


-8.4167 


0.2 


wi 


-4.4000 


-4.4031 


0.07 


r]i2 


0.4400 


0.4432 


0.7 


'713 


-0.60 


-0.6033 


0.54 


Vii 


0.96 


0.9625 


0.3 


Vis 


0.80 


0.8022 


0.3 


7l2 


-0.480 


-0.4806 


0.1 


715 


0.8 


0.8013 


0.2 


Qii 


0.20 


0.2020 


1.0 




o 


£5 




"5 





Vai 


V 








< 


'754 




150 200 250 300 350 



FIG. 6: Accurate inference of (a) the parameters of the fifth 
oscillator in the system (I25II and (b) the elements of the last 
row of the diffusion matrix Q. The horizontal axes show the 
number of blocks of data used, with 800 points in each block, 
sampled at h — 0.02. 



curves 1 and 2 obtained with the same noise intensities. 
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FIG. 7: Further demonstration of improved inference accu- 
racy due to the prefactor Q in the hkehhood function. The 
true value of the parameter being inferred is indicated by the 
vertical dashed line. Histograms 1 and 2 show results ob- 
tained with our algorithm, while histograms 1' and 2' are due 
to the GLS method, showing the detrimental effect on infer- 
ence accuracy of the missing prefactor. The diffusion matrix 
was Q for curves 1 and 1', and 2Q for curves 2 and 2' (see 
text). 

IV. DISCUSSION 

In this paper, we introduced a novel technique for in- 
ferring the unknown parameters of stochastic nonlinear 
dynamical systems from time-series data. The key fea- 
tures of our approach are 

• a likelihood function written in the form of a path 
integral over stochastic system trajectories, prop- 
erly accounting for measurement noise and opti- 
mally compensating for dynamical noise; and 

• a parameterization of the unknown force field that 
renders the inference problem essentially linear, de- 
spite the strong nonlincarity of the model itself. 

Specifically, our analytical derivation produces the cor- 
rect Jacobian prefactor in the likelihood function, which 
was missed in earlier works. Meanwhile, the represen- 
tation of the system nonlinearity as an expansion over 
a set of basis functions provides stable and robust in- 
ference for a broad range of dynamical models. These 
features enabled us to devise a highly accurate and effi- 
cient Bayesian inference algorithm that can reconstruct 
models of stochastic nonlinear dynamical systems with- 
out resorting to brute-force numerical optimization. 

We illustrated the advantages of our approach by ap- 
plying it first to the inference of the stochastic nonlinear 
dynamical system of Lorenz. In the context of parameter 
estimation with 8 unknown parameters, we showed that 
the accuracy and efficiency of our algorithm exceed those 
achieved (under similar conditions) in earlier works. We 
also demonstrated that our algorithm can deal with the 



Lorenz system in the more general setting of model re- 
construction, i.e., assuming no knowledge of the func- 
tional form of the nonlinear vector field. Although a 
much larger number of 33 unknown parameters were in- 
volved here, our algorithm was still able to achieve a high 
inference accuracy. 

In order to further illustrate the strengths of our al- 
gorithm, we applied it next to a system of five coupled 
nonlinear noisy oscillators. Using a set of polynomial ba- 
sis functions for the nonlinear field and a full covariance 
matrix for the dynamical noise, the model comprised 165 
unknown parameters, all of which were inferred within an 
error of 1% from a data set of 10^ points, taking only a 
few seconds on a personal computer of average comput- 
ing power. These demonstrations, we believe, are con- 
vincing representation of the capability of our approach, 
in both accuracy and efficiency, for reconstructing models 
of stochastic nonlinear dynamical systems. 

Furthermore, the efficiency of our algorithm has en- 
abled us recently to identify a stochastic nonlinear model 
of coupled cardiovascular oscillators using univariate 
physiological times series data ,.29.] thus opening a new 
venue for a broad range of important interdisciplinary 
applications. 

Several simplifying assumptions were made here to 
provide a clear description of the algorithm in its barest 
form. Although the examples of Section IIIII dealt with 
noiseless measurements, we have indicated in Section Hll 
how measurement noise can be included systematically 
in our inference algorithm. These examples also required 
the use of polynomials only, but the approach is in fact 
completely flexible regarding the type of basis functions 
used to model the nonlinear force, including time de- 
lays. Additionally, the path- integral technique used here 
to derive the likelihood function allows for a number of 
straightforward generalizations of our algorithm to the 
reconstruction of models with colored and multiplicative 
(or parametric) dynamical noise, and arbitrary (i.e., not 
necessarily uniform or short) sampling intervals. Finally, 
although the basic theory of Section ^ was developed 
under the implicit assumption that all dynamical vari- 
ables are available for measurement, we have shown in 
Section Fill Bl that our algorithm is able to reconstruct a 
complete model from partial measurements of the sys- 
tem trajectory. Since it is often not feasible to measure 
all degrees of freedom in practice, a generalization of our 
algorithm to deal with "hidden variables" will be very 
useful. These extensions will be explored in subsequent 
publications. 
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Appendix A 
Maximum-likelihood parameter estimation 
for a one-dimensional system 

Consider a one-dimensional stochastic nonlinear dy- 
namical system 



(27) 



where ^(t) is a zero- mean Gaussian noise process with 
{£.{t)£,{0)) = D5{t). The mid-point approximation to 
(|27|l on the time lattice {i„ = io + 'n-h, n = 0, 1, . . . , iV} is 



(28) 



where we used x„ = x{tn) and Xn = \ [xn+i + Xn), and 
2;„ = J^*""'"'' ^(i) form a sequence of zero-mean Gaus- 
sian random variables with {z^ z^') — hD5nn'- 

The probability of realization of a particular random 
sequence {zn\ is 



Af-l 



m^n}] = n 



dz„ 



n=0 



\/2TrhD 



exp 



2/i7:> 



(29) 



Using the Markovian property of x{t) and the transfor- 
mation rule p{{zn}) n«dz„ = pi{xn}) Hnda^n, along 
with (|28|l and H29|) , we find the probability density of the 
dynamical system trajectory to be 



pi{Xn}) = Pst{xo)Ji{Xn})i27ThD)-^/^ 
N-1 



n 



X I I exp 

n=0 



2D 



(30) 



wherein = (a;„+i— x„)//i, and the Jacobian of the trans- 
formation, to lowest order in h, is 



j({x„}) ^ n 

n=0 



exp 



AT-l 

'2 ^ 

n=0 



f'{Xn) 



Here, prime indicates differentiation with respect to ar- 
gument. Thus, we obtain for the negative-logarithm of 
(|30|l the expression 



N 

S = -lnpst(a;o) + y ln(2^/iD) 



n=0 ^ 



D 



!(x^)f 



(31) 



Assume now that we observe a time series \xn\^ and 
wish to reconstruct a one-dimensional stochastic nonlin- 
ear dynamical model for the system that generated the 
data; i.e., infer the form of the nonlinear function /(x) 
and estimate the noise intensity D in (|27|l . A fruitful ap- 
proach to this problem is to model the nonlinearity as a 
linear superposition of a set of nonlinear basis functions: 



M 
m— 1 



'u(2;). 



(32) 



The maximum-likelihood (ML) estimates for the un- 
known model parameters c and D are then furnished by 
the global minimum of S. Thus, setting dS/dD — and 
passing to the limit /i ^ with T = Nh, we find 



1 r*"+^ 2 
D^— fi - u(x)] dt. 



(33) 



Next, substituting 1)32(1 into ((31(1 and rearranging, we ob- 
tain 5' = p — c"'' w -t- i c""" H c, where 



N 1 

p = - \np,t{xo) + — H2TThD) + — 



to+T 



x^ dt, 



to 



to+T 



'to 
1 

D 



1 . , , 1 du(x) 



dt, 



to+T 



u(x) u^{x) dt. 



The condition dS/ dc = now gives 



(34) 



The ML estimates are found by iterating ((33() and l(34() 
to convergence. 

In Section this theory is extended to deal with 
multi-dimensional system models and to include prior 
information on model parameters; it is particularly in- 
teresting to contrast the results above with our main al- 
gorithm given in Section III El 



Appendix B 
The generalized least-squares estimator 

It is insightful to contrast the algorithm presented in 
this paper with the generalized least-squares (GLS) es- 
timator. Starting again with the system we neglect 
measurement noise, adopt the parameterization of 1(12(1 . 
and apply the mid-point approximation, obtaining 



y„ = U„c + Cn, n = 0,l,...,N-l, 



(35) 



where, as before, we introduced y„ = (yn+i — yn)/h and 
U„ = U(y„) with y„ = i (y„+i +y«). The vectors {C„} 
satisfy 



iCn 



We may arrange the N equations contained in ((35(1 into 
a single partitioned matrix equation as 



where 



H 



d = H7 + n. 



Uo 

6 Ui 



(36) 





6 







u 
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7 is a column vector comprising N copies of the unknown 
model coefficient vector c, and d = [yo Yi ■ • ■ yAr-i]""" 
and n = [il'o Ci • ■ ■ CAf-i]""" are composite data and noise 
vectors, respectively, the latter having zero mean and a 
covariance matrix of the form 

" D 6 ... 6 ' 
1 6 D ... 6 

.6 6 . . . D . 

Now, the GLS estimator for the vector 7 in (|36|l is 
given by (see, e.g., 1231) 

■y ^ (h^ A-^ H}j ^il^A^d. (37) 

Using the diagonal forms of the matrices H and A, we can 



extract from H37() the following estimate for our model 
coefficient vector: 

(N-l \ ~1 Af-1 

^UTD-iuJ 5]UTD-Iy„. (38) 
ri=0 / Ji=0 

A comparison of H38|l with our corresponding result 
(|21|l is facilitated by an examination of the definitions 
(|17|l and (|18|l . whereupon it is seen that, in the absence 
of prior information (i.e., Spr 0), the only difference 
between the two estimates is the additional term ^ v„ 
in our expression. The importance of this extra term is 
borne out by the examples given in Section IIIII where 
it is observed that the GLS estimator leads consistently 
to grossly inaccurate parameter estimates, while our al- 
gorithm succeeds in achieving arbitrarily high inference 
accuracy. 
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